Preface


This notation system provides a set of characters by which textual material, 
particularly the contents of scientific and technical texts, can be represented. 
The system has been initially designed to accommodate the complexities of the 
texts of United States patents. Symbology has been devised to preserve details 
which we believe will eventually be needed from patent texts. Conversely, tech- 
niques have been incorporated in the notation system for producing uniformity 
in certain sectors of patent text where such uniformity appeared preferable. 
These diverse demands of detail and uniformity have apparently not been
required in the projects described in items of the bibliography.


While it is planned to use the transliterated patent corpus for studies of re- 
trieval of information from raw text, the corpus will first be used by Kenneth 
Knowlton to create a record tape of a format which can be utilized in linguistic 
analysis programs. Since similar work has been done by MIT with other cor- 
pora, existing programs are serving as springboards for the creation of newer 


programs. These are described in Section 5. 


The notation system is amenable to alteration to suit purposes other than that 
of its first use, including transliterations of non-patent, scientific and technical 
texts. The system will undoubtedly be changed and enlarged, foreseeable addi- 
tions including procedures and nomenclature for representing the elements of 


mathematical and chemical formulas. 


The undersigned is indebted to his co-authors--to Ken Knowlton who established 
the basics of the system and to Rowena Swanson who weeded, amplified, and 
edited. Thanks are also due to Dr. Victor H. Yngve of MIT and to Don D. 
Andrews, Director of the Office of Research and Development of the U. S. Patent 


Office, for helpful suggestions. 


Simon M. Newman 
A NOTATION SYSTEM FOR TRANSLITERATING TECHNICAL AND 
SCIENTIFIC TEXTS FOR USE IN DATA PROCESSING SYSTEMS 


1.0 INTRODUCTION 


This report provides-- 


a. the codes and instructions for punching (and verifying) certain portions of the headings and all of 
the specifications, claims, and cited references which are contained in the texts of U. S. patents, and 

b. a comparison of the punched card format with the IBM 704 tape output format. 

The notation system which has been devised for preparing the punched card text is machine oriented 
to simplify the program for creating a magnetic tape from the card deck. This imposes a greater load on
the card puncher since mnemonic codes and other memory aids could but infrequently be incorporated. 
The IBM 704 tape output is user oriented and is designed to be readable by many who are not familiar 
with the notation system. 

The codes in Sections 2 and 3 are expressed in IBM 024 alpha-numeric characters since the IBM 024 
Card Punch is being used in the punching operation. Since the Fortran Punch is normally used in con- 
junction with the IBM 704 in which the punched cards will be processed, the characters of both the IBM 
024 and Fortran Punches are listed in the Symbol Dictionary of Section 4, and the Fortran characters,
exclusively, are referred to in Section 5 relating to the IBM 704 programs. 

The punched text, the magnetic tape, and the programs for creating the tape and for using it to locate 
and list individual words and groups of words in their context will be used for research in language and 
in patent searching at our Office of Research and Development, at the National Bureau of Standards, and 
at the Massachusetts Institute of Technology. 

Every effort will be made to make available copies of the card deck, the IBM 704 program decks, and 
the IBM 704 tape at cost to bona fide researchers. 


2.0 SYMBOLOGY FOR TECHNICAL TEXT TYPOGRAPHY 


2.1 The codes are expressed in terms of the characters of the standard IBM 024 alpha-numeric Card
Punch. The IBM 024 characters are the same as those of the Fortran Punch, which is normally used in
conjunction with the IBM 704, with the exception of five characters. The punch patterns for these
characters and their designations in IBM 024 and Fortran symbology are given in Table 1.


Table 1.—Character Equivalents 


ST’D IBM IBM 704-FORTRAN 
024 SYMBOL SYMBOL 


2.2 The card format is as follows: 


Columns 1-72 are for the patent text. 
Column 73 is left blank. 


Columns 74-80 will be used later for consecutively numbering the cards. These columns are allocated
as shown in Table 2. Column 80 is left blank so that a card, later made and inserted between two existing
cards, can be given a decimal designation.


Table 2.--Card Numbering Format


Col.                            74         75        76       77     78     79     80

                            100,000's  10,000's   1,000's   100's   10's  units  blank


Thus card 18 is punched                                               1      8
card 3,274 is punched                                3        2       7      4
and card 54,875 is punched                  5        4        8       7      5

A new card is begun for each document. Column 1 of each following card contains whatever text con- 
secutively follows column 72 of the immediately preceding card; e.g.: 
(1) A word which does not end in column 72 continues in column 1 of the next card: 


1                                                                     72
.............................................................THE OPERATI
ON OF THE PUNCH.........................................................
(2) If a word or sentence ends in column 72, a space is left in column 1 of the next card: 


1                                                                     72
...........................................................THE OPERATION
 OF THE PUNCH...........................................................
(3) An abbreviation which does not end in column 72 continues in column 1 of the next card: 
72 
| 


THE END IS ETC 
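
The card format of Section 2.2 can be sketched in a few lines of Python. This is an illustrative sketch only, not one of the IBM 704 programs of Section 5; the function names are hypothetical, and the choice to leave unused high-order serial columns blank (rather than zero-filled) is an assumption, since the report does not state which is used.

```python
CARD_TEXT_WIDTH = 72  # columns 1-72 carry the patent text


def split_into_cards(punched_text):
    """Cut already-transliterated text into 72-column text fields.

    Column 1 of each card holds whatever follows column 72 of the
    preceding card; there is no padding and no word wrap.
    """
    return [
        punched_text[i:i + CARD_TEXT_WIDTH]
        for i in range(0, len(punched_text), CARD_TEXT_WIDTH)
    ]


def card_image(text_field, serial):
    """Build one 80-column card image.

    Text goes in columns 1-72, column 73 is blank, the serial number is
    right-justified in columns 74-79, and column 80 is blank.
    Assumption: leading serial columns are blank, not zero.
    """
    if not 0 < serial <= 999_999:
        raise ValueError("serial must fit in columns 74-79")
    return text_field.ljust(CARD_TEXT_WIDTH) + " " + str(serial).rjust(6) + " "
```

A word that straddles column 72 is simply cut, as in illustration (1) above: `split_into_cards` returns the text field ending in the first part of the word, followed by a field that begins with the rest of it.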


2.3 Letters 
Letters in lower case are punched normally, e.g.: 


small is punched SMALL 


Letters in upper case are introduced by an asterisk (*), a * being punched in front of each upper case
letter, e.g.:


Reissue is punched *REISSUE
point A is punched POINT *A


Exception 1: A word or words in the text which are entirely capitalized are introduced by the begin
capitalized words symbol


**%


The letters of the capitalized word or words are punched as lower case letters and the word or words are
followed by the end capitalized words symbol


**¤
Thus, the text
The NEEDLE VALVE is emphasized.
is punched
*THE **%NEEDLE VALVE**¤ IS EMPHASIZED.


Exception 2: The complete title of the document and subtitles in the text are punched in lower case 
letters regardless of how they appear in the document. [See Sections 3.1 and 3.4].
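
The basic letter rule of Section 2.3 lends itself to a short sketch. The function below is hypothetical and covers only the general rule (an asterisk before each upper case letter); it deliberately does not implement Exception 1 or Exception 2.

```python
def punch_letters(text):
    """Apply the general letter rule of Section 2.3, character by character."""
    out = []
    for ch in text:
        if ch.isalpha() and ch.isupper():
            out.append("*" + ch)      # upper case: asterisk, then the letter
        else:
            out.append(ch.upper())    # lower case letters punch as plain capitals
    return "".join(out)
```

Applied to the two examples of this section, `punch_letters("Reissue")` gives `*REISSUE` and `punch_letters("point A")` gives `POINT *A`.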


2.4 All Arabic numbers are punched normally, regardless of the type in which they appear in the text. 
Exception: When a phrase or sentence is in a form of type other than normal type, numbers which 
are included in the phrase or sentence are in the type in which they appear in the phrase or sentence. 
Thus, numbers in bold face in normal text are punched normally, e.g.: 
Apertures between needle valve 101 and port 102 
is punched 
*APERTURES BETWEEN NEEDLE VALVE 101 AND PORT 102
Numbers in italics within a phrase in italics are punched as being in italics, e.g.: 
In example 2, the number is in the phrase. 
is punched 
*#*IN EXAMPLE 2*$, THE NUMBER IS IN THE PHRASE.
[See Section 2.9].
2.5 When a Roman numeral occurs, it is introduced by the begin Roman numeral symbol 
*/
the equivalent Arabic numeral is punched, and the numeral is followed by the end Roman numeral symbol


*,


as follows:
the post XII is
is punched


THE POST */12*, IS


2.6 A space in the text appears as a space (no punch) in the punched card. With few exceptions, no space
appears in the punched card except in correspondence with a space in the text. Thus, the text 


Word is separated from word by space, and numbers 1/2, 21, and 4,185 are separated as 
shown. Sentence is separated from sentence by single space. 


is punched 
1                                                                     72
*WORD IS SEPARATED FROM WORD BY SPACE, AND NUMBERS 1/2, 21, AND 4,185 AR
E SEPARATED AS SHOWN. *SENTENCE IS SEPARATED FROM SENTENCE BY SINGLE SPA
CE.


Note that a punctuation mark may be part of a word as well as a device for separating words. In the 
above illustration, a space follows the commas separating the complete numbers 1/2 and 21, but there is 
no space in 4,185 because the comma is part of the single whole number. 

Also note that a single space may be “punched” after the period ending a sentence, whereas two spaces
frequently appear in printed text. 

A “space” consists of any number of spaces from 1 through 71, as long as no other character
intervenes. However, a “space” consisting of 72 or more consecutive spaces signifies the end of a document.
[See Section 3.8].
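
The two meanings of a run of blanks described in Section 2.6 can be illustrated with the following sketch. It assumes the text fields (columns 1-72) of a deck have already been concatenated into one string; the function name is hypothetical, and stripping blanks from the ends of each document is a convenience of the sketch rather than a rule of the notation.

```python
import re

END_OF_DOCUMENT_RUN = 72  # 72 or more consecutive blanks end a document


def split_documents(card_stream):
    """Split a concatenated stream of text fields into documents.

    A run of 1 to 71 blanks counts as one space; a run of 72 or more
    blanks is taken as the end of a document.
    """
    documents = re.split(r" {%d,}" % END_OF_DOCUMENT_RUN, card_stream)
    # Collapse every remaining run of blanks to a single space.
    return [re.sub(r" +", " ", doc).strip() for doc in documents if doc.strip()]
```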


2.7 The period performs a variety of functions in text. These functions are separately noted in the 
punched text as follows: 


2.7.1 The period at the end of a sentence is punched immediately after the last letter of the last word 
without a space and it is followed by a space, e.g.: 


The machine stops. Begin next sentence. 
is punched 
*THE MACHINE STOPS. *BEGIN NEXT SENTENCE.


2.7.2 The period as a decimal point is punched between the two appropriate digits of the number with 
no space before or after it, e.g.: 


103.50 is punched 103.50 


2.7.3 The period at the end of an abbreviation is represented by two asterisks punched between the 
last letter of the abbreviation and the period without spaces, e.g.: 


etc. is punched ETC**.
2.7.4 For the period at the end of an abbreviation which is the last word of a sentence, an indication 
of both functions of the single period is necessary. Thus, although only one period occurs in the text, a
period for the end of the abbreviation and a period for the end of the sentence are both punched without 
any intermediate space, e.g.: 
The last word is etc. 
is punched 


*THE LAST WORD IS ETC**..


2.7.5 The period after a number which begins a paragraph is treated as the period of an abbreviation;
e.g., the following text appearing in column 11 of patent 2,709,339


at line 19 We claim:


at line 20 1. In a two stage
at line 51 2. In a two stage
is punched


*11@19 *WE CLAIM**C
*11@20 1**. *IN A TWO STAGE
*11@51 2**. *IN A TWO STAGE
[See Section 3.5 for patent paragraph and patent claim notation symbology].


2.7.6 Three successive periods (or asterisks) which signify an ellipsis (an omission of words from 
the text) are represented by the ellipsis symbol 


**H


When the ellipsis occurs at the end of a sentence, the ellipsis and the end of the sentence are both in-
dicated, as follows:


**H.
Thus the text
a, b, c,***n equals u, v, w,....
is punched 
A, B, C,**HN EQUALS U, V, W,**H. 
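
The period rules of Sections 2.7.1 through 2.7.5 reduce to two independent decisions, which the following sketch makes explicit. The function is hypothetical; deciding whether a word is an abbreviation, or whether it closes a sentence, is left to the caller, just as the manual leaves it to the card puncher.

```python
def punch_period(word, is_abbreviation, ends_sentence):
    """Return an already-punched word followed by its period code(s)."""
    code = word
    if is_abbreviation:
        code += "**."   # 2.7.3: two asterisks, then the period
    if ends_sentence:
        code += "."     # 2.7.1: plain period directly after the word
    return code
```

For an abbreviation that also ends a sentence (Section 2.7.4), both branches apply, so `punch_period("ETC", True, True)` yields `ETC**..` with no intermediate space.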
2.8 Bold-face type is introduced by the begin bold-face type symbol 
**#
and is followed by the end bold-face type symbol
**$
In the following illustration, the word BOLD appeared in bold face in the text
*THE **#BOLD**$ LETTERS STAND OUT.
[See Section 2.4 concerning numerals in bold face].


2.9 Letters or words in italics or letters or words which are underscored are preceded by the begin
italics symbol


*#
and are followed by the end italics symbol
*$


One letter, one or more words, a whole sentence, or a whole paragraph may be in italics or may be 
underscored. The begin italics and end italics symbols are punched at the beginning and end, respec- 
tively, of each consecutive grouping of italicized or underscored letters (the letter, word, sentence, or 
paragraph). Insertion of a word not in italics or not underscored ends the italicized or underscored
grouping. [See Section 2.4 concerning numerals in italics].


Ex. 1—The phrase “the need is great” is punched 


THE *#NEED*$ IS GREAT When only the word need is in italics 
THE *#NEED*$ IS *#GREAT*$ When both the words need and great are in italics
*#THE NEED IS GREAT*$ When the entire phrase is in italics


Ex. 2—snow, not sow 
is punched 
S*#N*$OW, NOT SOW 


2.10 Quotation marks are replaced by the begin quotation symbol **Q and the close quotation symbol **U
at the beginning and end of a quotation, respectively, e.g.: 


“This sentence is in quotes.” 
is punched 
**Q*THIS SENTENCE IS IN QUOTES**U. 


When a close quotation mark occurs at the end of a sentence, printers usually put the punctuation in- 
side the quotation mark, thus: 


“...to the end.” 


In punching, it is necessary to punch the close quotation mark before the punctuation mark, as though 
the text read 


“...to the end”. 
so that the punctuation mark will occur at the end of the sentence. 
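
The reordering of the close quotation mark and the punctuation mark can be sketched as follows. The sketch assumes the input uses typographic (curly) double quotation marks, so that opening and closing marks can be told apart, and it moves any single punctuation mark found just inside a closing mark; both are simplifying assumptions, and the function name is hypothetical.

```python
import re


def punch_quotes(text):
    """Replace double quotation marks with **Q and **U (Section 2.10).

    Punctuation printed just inside a closing mark is moved outside it
    first, so that it falls at the end of the sentence.
    """
    text = re.sub(r"([.,;:?!])”", r"”\1", text)   # “...end.” becomes “...end”.
    return text.replace("“", "**Q").replace("”", "**U")
```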


2.11 The same symbol is used for the prime mark, the apostrophe, and the single quote mark. The 
prime-and-apostrophe symbol is 


**A


Double and triple primes are represented by the appropriate repetition of the prime-and-apostrophe
symbol, as follows:


single prime **A
double prime **A**A
triple prime **A**A**A
The prime-and-apostrophe symbol is applied to single quotation marks which both begin and close the
quoted text.
The following examples are illustrative. 
Ex. 1—the metal’s expansion characteristics
is punched 
THE METAL**AS EXPANSION CHARACTERISTICS 
Ex. 2—the area of port 94' is 
is punched 
THE AREA OF PORT 94**A IS 
Ex. 3—the square a'B''c'''D
is punched 
THE SQUARE A**A*B**A**AC**A**A**A*D 
Ex. 4—the “pump handle ‘A’ ” 
is punched 
THE **QPUMP HANDLE **A*A**A**U 
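
Because one symbol serves for the prime, the apostrophe, and the single quote, the rule of Section 2.11 amounts to a character-for-symbol substitution. The following sketch (a hypothetical function) combines it with the general letter rule of Section 2.3 so that the examples above can be reproduced; multiple primes are assumed to be typed as repeated single marks.

```python
def punch_primes(text):
    """Encode primes, apostrophes, and single quotes as **A."""
    out = []
    for ch in text:
        if ch in "'’‘′":
            out.append("**A")         # one **A per mark; '' gives **A**A
        elif ch.isalpha() and ch.isupper():
            out.append("*" + ch)      # upper case letter rule of Section 2.3
        else:
            out.append(ch.upper())
    return "".join(out)
```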


2.12 Letters or numbers frequently appear in technical texts in superscript or in subscript. 
The superscript is introduced by the superscript symbol *& as follows: 


3 a 
arm’, lever®, 10° 3 or shaft B 
is punched 
ARM*&8, LEVER*&*B, 10*&5*&.*&3, OR SHAFT*&A*&*B 


The superscript symbol followed by the numeral zero is used to designate the degree or temperature 
mark or the angle mark, e.g.: 


water at 23°C is punched WATER AT 23*&0*C 
The subscript is introduced by the subscript symbol *@ as follows: 


cam 


A? H,0, Hp Feo 3) or line, 


is punched 
CAM*@*A, *H*@2*O, *H*@F, *FE*@2*@@*@3, OR LINE*@*B*@A
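
As the punched examples show, the superscript or subscript symbol is repeated before every raised or lowered character. A minimal sketch of that repetition follows; the names are hypothetical, the markers are given in their IBM 024 form, and the sketch does not convert a hyphen within a subscript to @.

```python
SUPERSCRIPT = "*&"   # IBM 024 form of the superscript symbol
SUBSCRIPT = "*@"     # IBM 024 form of the subscript symbol


def punch_script(marker, characters):
    """Prefix every raised or lowered character with its marker."""
    out = []
    for ch in characters:
        # An upper case letter keeps its own asterisk after the marker.
        letter = "*" + ch if ch.isupper() else ch.upper()
        out.append(marker + letter)
    return "".join(out)
```

The degree mark is then simply a superscript zero: `"23" + punch_script(SUPERSCRIPT, "0") + "*C"` gives `23*&0*C`.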
2.13 Letters of the Greek alphabet are introduced by the following symbols: 
lower case Greek letter symbol **Y
upper case Greek letter symbol **Z
The appropriate Greek letter symbol is punched preceding each Greek letter which occurs in the text.
Normal Roman letters in lower case type are substituted for the Greek letters in punching. (This
substitution is called transliteration). The Greek letters and their Roman letter equivalents are given
in Table 3.
Exemplary contexts including Greek letters are punched as follows:


2πr is punched 2**YPR

2Δπr is punched 2**ZD**YPR

α numeric is punched **YA NUMERIC

Σθ²σᵨ is punched **ZS**YJ*&2**YS*@**YR


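Table 3.—Greek Letters and Equivalents


     Greek Letters                                Roman Letter
Upper Case    Lower Case    Word Equivalent       Equivalent

    Α             α         alpha                     A
    Β             β         beta                      B
    Γ             γ         gamma                     G
    Δ             δ         delta                     D
    Ε             ε         epsilon                   E
    Ζ             ζ         zeta                      Z
    Η             η         eta                       H
    Θ             θ         theta                     J
    Ι             ι         iota                      I
    Κ             κ         kappa                     K
    Λ             λ         lambda                    L
    Μ             μ         mu                        M
    Ν             ν         nu                        N
    Ξ             ξ         xi                        X
    Ο             ο         omicron                   O
    Π             π         pi                        P
    Ρ             ρ         rho                       R
    Σ             σ         sigma                     S
    Τ             τ         tau                       T
    Υ             υ         upsilon                   U
    Φ             φ         phi                       F
    Χ             χ         chi                       C
    Ψ             ψ         psi                       Y
    Ω             ω         omega                     Q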
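
The transliteration of Table 3 can be expressed as a lookup table. The sketch below is illustrative only: the function name is hypothetical, the word-final form of sigma is mapped to S as an added convenience, and Roman letters are simply capitalized without the upper case asterisk of Section 2.3.

```python
# Roman equivalents of Table 3, keyed by lower case Greek letter.
GREEK_TO_ROMAN = {
    "α": "A", "β": "B", "γ": "G", "δ": "D", "ε": "E", "ζ": "Z",
    "η": "H", "θ": "J", "ι": "I", "κ": "K", "λ": "L", "μ": "M",
    "ν": "N", "ξ": "X", "ο": "O", "π": "P", "ρ": "R", "σ": "S",
    "ς": "S", "τ": "T", "υ": "U", "φ": "F", "χ": "C", "ψ": "Y",
    "ω": "Q",
}


def punch_greek(text):
    """Prefix each Greek letter with **Y (lower case) or **Z (upper case)."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in GREEK_TO_ROMAN:
            marker = "**Z" if ch != lower else "**Y"
            out.append(marker + GREEK_TO_ROMAN[lower])
        else:
            out.append(ch.upper())
    return "".join(out)
```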
2.14 For ready reference, IBM 024 and Fortran notations for punctuation marks and other symbols, in-
cluding the notations discussed in the foregoing subsections and notations not heretofore presented, are
summarized in Table 4.


Table 4.--Notations for Typographic Marks 


ST'D IBM IBM 704-FORTRAN 
MARK 024 SYMBOL SYMBOL 


.  (period)                   See Section 2.7

,  (comma)                    ,             ,

(  (begin parentheses)        %             (

)  (close parentheses)        ¤             )

+  (plus)                     &             +

-  (hyphen)                   @             -

—  (long dash)                @             -

=  (equals)                   #             =

$  (dollar sign)              $             $

/  (fraction mark)            /             /

/  (and/or)                   /             /

[  (begin bracket)            *%            *(

]  (close bracket)            *¤            *)

'  (apostrophe)               **A           **A
'  (prime)                    **A           **A
'  (single quote)             **A           **A

%  (per cent)                 **K           **K
:  (colon)                    **C           **C
?  (question mark)            **I           **I

∠  (angle)                    **L           **L
×  (multiplication sign)      **M           **M

"  (quotes)                   See Section 2.10

;  (semicolon)                **S           **S
°  (degree, temperature)      *&0           *+0

↓  (direction sign)           **V           **V
↑  (direction sign)           **W           **W

!  (exclamation point)        **X           **X

—— (direction sign) ba shar 

— (direction sign) ae, eat) 

÷  (division mark)            /             /

   superscript                *&            *+

   subscript                  *@            *-
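
Since the IBM 024 and Fortran notations differ in only five characters, conversion between the two columns of Table 4 is a character-for-character substitution. The following is a sketch under one stated assumption: the lozenge character of the IBM 024 keyboard is written here as ¤. The table names are hypothetical.

```python
# The five characters that differ between the two punches.
# Assumption: "¤" stands in for the IBM 024 lozenge.
IBM024_TO_FORTRAN = str.maketrans(
    {"&": "+", "@": "-", "#": "=", "%": "(", "¤": ")"}
)
FORTRAN_TO_IBM024 = str.maketrans(
    {"+": "&", "-": "@", "=": "#", "(": "%", ")": "¤"}
)
```

For example, `"*2@47".translate(IBM024_TO_FORTRAN)` gives `*2-47`, and translating the result with `FORTRAN_TO_IBM024` restores the original.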
2.15 When a mark, symbol, or other situation is encountered which is not accommodated by any of the
foregoing provisions, the patent text copy is marked in red and the symbol 


**B 
is punched to represent the omitted portion of the patent text. 


2.16 When an error is made in the punched text, the card in which the error occurred is repunched. 
When the error includes an omission from the text, a card or cards are inserted to contain the overflow 
text. 

When the last non-space of inserted text is in the middle of a text word, or when the total number of 
blanks between the last non-space on the insert card and the first non-space on the following card is 
72 or greater, the cancel space symbol 


* space 


is punched immediately after the inserted text. This symbol serves to cancel the * and all spaces after 
the * so that the text with the words and spacing shown in the document is preserved. 

The procedure for correcting errors is illustrated in the following examples. The text from which 
the examples are taken reads as follows: 


Another object of the invention is to provide a piston pump capable of operating
at high temperatures and having improved piston expansion characteristics to 
reduce loss of volumetric efficiency. More specifically, the invention aims to 
provide a pump having the ability to compensate... 


Ex. 1—The following was punched: 


1                                                                     72
..............................................................CAPABLE OF
OPERATING...........................................................COMP


Since the F of OF occurred in column 72, there should have been a space in column 1 of the next card be-
fore the word OPERATING. The error is corrected as follows:


1                                                                     72
..............................................................CAPABLE OF   (unchanged)
 *                                                                         (insert card)
OPERATING...........................................................COMP   (unchanged)


The insert card provides a space in column 1. Since this is the only omission, the * is punched in column 
2 with no punches in the rest of the card. Therefore, only the space in column 1 of the insert card be- 
comes part of the punched text. 


Ex. 2—The following was punched: 


1                                                                     72
..........................................................*MRE SPECIFICA
LLY, THE INVENTION..............................................COMPENSA


The letter O was omitted from the word *MORE. The error is corrected as follows:


1                                                                     72
..........................................................*MORE SPECIFIC   (new card)
A*                                                                         (insert card)
LLY, THE INVENTION..............................................COMPENSA   (unchanged)


A new card is punched with the typographical error corrected. Since the error involved an omission, 
the text which overflows the new card is punched into the insert card. Since the last letter punched 
into the insert card is in the middle of a text word, the * is punched immediately after the letter and no 
further punches are made in the insert card. The punched text now mirrors the document text. 


Ex. 3--The following was punched: 


1                                                                     72
.......................................................*MORE SPECIFICY,
THE INVENTION...................................................COMPENSA


The letters ALL were left out of the word SPECIFICALLY. The error is corrected as follows:


1                                                                     72
.......................................................*MORE SPECIFICALL   (new card)
Y,                                                                         (insert card)
THE INVENTION...................................................COMPENSA   (unchanged)


A new card is punched with the typographical error corrected. Since the error involved an omission, the 
text which overflows the new cardis punched into the insert card. In comparison with Example 2 above, 
the last mark punched into the insert card in Example 3 is at the end of a text word. Since a “space” 
may consist of upto 71 blanks ina card [see Section 2.6] and only 70 blanks remain to the next non-space, 
no additional symbology is required to be punched into the insert card. 


Ex. 4—The following was punched: 


1                                                                     72
..........................................................*MOORE SPECIFI
CALLY, THE INVENTION............................................COMPENSA


An extra letter O was punched in the word *MORE. The error is corrected as follows:


1                                                                     72
.......................................................... *MORE SPECIFI   (new card)
CALLY, THE INVENTION............................................COMPENSA   (unchanged)

Since a letter was removed from the punched text, an additional space to take the place of this letter is
left in the new card. No insert card is necessary. In the above example, an additional space was left
before the word *MORE. [See Section 2.6 on the definition of “space”].


Ex. 5—The following was punched: 


1                                                                     72
*ANOTHER...................................................PISTON CAPBLE
 OF OPERATING.......................................................COMP


The word PUMP was omitted from the text and the letter A was omitted from the word CAPABLE. The
error is corrected as follows:


1                                                                     72
*ANOTHER...................................................PISTON PUMP C   (new card)
APABLE                                                                     (insert card)
 OF OPERATING.......................................................COMP   (unchanged)


A new card replaces the erroneous one. The overflow text is punched into the insert card. Since the 
total number of spaces from the last non-space on the insert card (which is the end of a word) to the next 
non-space on the following card is less than 71, no additional symbology is required on the insert card. 
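
The effect of the cancel space symbol on the reconstructed text can be illustrated as follows. This is a sketch only, not the tape-writing program itself: the function name is hypothetical, and it assumes that an asterisk immediately followed by a blank occurs nowhere in a deck except as the cancel space symbol.

```python
import re


def join_cards(text_fields):
    """Rebuild running text from the text fields (columns 1-72) of a deck."""
    # Short fields are padded with blanks to the full 72 columns.
    stream = "".join(field.ljust(72) for field in text_fields)
    # "* space": drop the * together with all blanks up to the next non-space.
    stream = re.sub(r"\* +", "", stream)
    # Any other run of blanks is a single space (Section 2.6).
    return re.sub(r" +", " ", stream).rstrip()
```

With the corrected deck of Ex. 2, the insert card `A*` joins the two halves of the word without a gap; with the insert card of Ex. 1, only the blank in column 1 survives.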
2.17 The following symbols are specifically excluded: 

# 

& 

Notations will be designated for these symbols if they are needed. 

[See Section 2.15 for other symbols not provided for]. 


3.0 SYMBOLOGY PECULIAR TO PATENT TEXT 


3.1 The following information is extracted from the patent heading in the following order: 


(1) The patent number 
(2) The full title of the invention 
(3) The full name(s) of the inventor(s) 


The patent number (1) needs no comment. 

3.1.1 The title of the invention (2), while often printed in upper case on the patent, is punched entirely 
as if it were in lower case. This procedure applies to every letter in the title including the first letter 
of the first word of the title and the first letter of a proper name or geographic location in the title, e.g.:
Ex. 1—Control Mechanism for Pump 

is punched 
CONTROL MECHANISM FOR PUMP 
Ex. 2-CONTROL MECHANISM FOR PUMP 
is punched 


CONTROL MECHANISM FOR PUMP 
Ex. 3—Flow Through a Bernouilli Tube
is punched 
FLOW THROUGH A BERNOUILLI TUBE 
Ex. 4—Synthesis of Iranian-Type Oil 
is punched 
SYNTHESIS OF IRANIAN@TYPE OIL 

3.1.2 When there is more than one inventor (3), the inventors’ names are separated by periods.

When the inventor is deceased, the name(s) of his representative(s) and the representative(s)’s title 
(such as executor or administratrix) are included in the patent heading. These names are punched ac- 
cording to the following format: inventor’s name followed by a **/ followed by the name of the repre- 
sentative. The representative’s title must begin with a lower case letter. 

When the inventor has changed his name, his new name is punched followed by **/FORMERLY followed
by his original name. 

There are no spaces on either side of the **/ as follows: 

Ex. 1—John Doe and James Roe; Mary Doe executrix of the estate of John Doe 
is punched 

*JOHN *DOE**/*MARY *DOE EXECUTRIX. *JAMES *ROE

Ex. 2—John Doe; Mary Roe, administratrix of the estate of James Roe
is punched

*JOHN *DOE. *JAMES *ROE**/*MARY *ROE ADMINISTRATRIX

Ex. 3—James Roe, now by change of name James T. Roe Co.
is punched

*JAMES *T. *ROE *CO.**/FORMERLY *JAMES *ROE

Ex. 4—Henry Phillips and William E. Hunt, deceased, by Josephine Hunt and Annie Boswell, executrices 
is punched 


1                                                                     72
*HENRY *PHILLIPS. *WILLIAM *E. *HUNT**/*JOSEPHINE *HUNT AND *ANNIE *BOSW
ELL EXECUTRICES
The assignment information, application date, and serial number are not copied. 
3.1.3 The last character of each of the following items of information: (1) patent number, (2) title, and
(3) inventors’ names, etc., is separated from the first character of the next item of information by a 
minor division symbol 


space / space 
Thus, the heading of patent 2,765,622 which reads as follows:
CONTROL MECHANISM FOR PUMP AND MOTOR FLUID SYSTEM. Don
R. Hill and Ernest C. Chasser, Warren, Ohio; Martha H. Hill, executrix of
said Don R. Hill, deceased. Application April 27, 1950, Serial No. 158,374.
is punched
1                                                                     72
2,765,622 / CONTROL MECHANISM FOR PUMP AND MOTOR FLUID SYSTEM / *DON *R.
 *HILL**/*MARTHA *H. *HILL EXECUTRIX. *ERNEST *C. *CHASSER /
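
The assembly of the heading from its three items can be sketched as below. The function name is hypothetical; the inventor entries are assumed to be supplied already punched (with asterisks and any **/ representative), and only the title rule of Section 3.1.1 and the minor division of Section 3.1.3 are applied.

```python
MINOR_DIVISION = " / "   # space / space, Section 3.1.3


def punch_heading(number, title, inventor_entries):
    """Assemble patent number, title, and inventors into one heading."""
    # Titles are punched entirely as lower case text: no asterisks.
    # A hyphen is punched as @ on the IBM 024.
    punched_title = title.upper().replace("-", "@")
    names = ". ".join(inventor_entries)   # inventors separated by periods
    return MINOR_DIVISION.join([number, punched_title, names]) + MINOR_DIVISION
```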
3.2 The beginning of each paragraph is indicated by a notation which furnishes the column number in 
which the paragraph occurs and the number of the line on which it begins. The column number is
preceded by a * and is followed by a @ (a hyphen on the Fortran Punch) and the line number, e.g.:


a paragraph beginning in column 2 at
line 47 with the word The is punched


*2@47 *THE
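
A program reading the punched text can recognize the paragraph notation by its fixed shape: an asterisk, a column number, @, a line number, and a space. The following sketch uses the IBM 024 form of the notation; the names are hypothetical.

```python
import re

# A single * (not part of a ** symbol), column number, @, line number, space.
PARAGRAPH_MARK = re.compile(r"(?<!\*)\*(\d+)@(\d+) ")


def find_paragraphs(punched):
    """Return (column, line) for each paragraph notation in the text."""
    return [(int(col), int(line)) for col, line in PARAGRAPH_MARK.findall(punched)]
```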


3.3 Formulas (chemical or mathematical), tabular material, and diagrams which appear in the patent text 
are not copied. When such material is encountered, the begin formula symbol 


**F


is punched and nothing more is punched in the remainder of the card. The formula, table, or diagram is
encircled in red on the patent text. The next card is begun with the end formula symbol 


**G space 


and punching of the patent text is continued. For example, lines 75 through 81 of column 4 of 2,706,891
which read as follows: 


of such end 116. Assuming that the area of end 116 is, say, seven times the area of the 
end 118 of piston 112, by the formula 


[formula]
the pressure developed on the fluid in the chamber 119 


are punched 


1                                                                     72
F SUCH END 116. *ASSUMING THAT THE AREA OF END 116 IS, SAY, SEVEN TIMES
THE AREA OF THE END 118 OF PISTON 112, BY THE FORMULA **F
**G THE PRESSURE DEVELOPED ON THE FLUID IN THE CHAMBER 119


Ending the punched text on a card with the begin formula symbol will facilitate insertion of the formu- 
la, table, or diagram at a future time. 


3.4 When a subtitle occurs in a column, the subtitle is introduced by the begin subtitle symbol 


**N
and the subtitle is copied in lower case letters regardless of the kind of type in which the subtitle appears
in the text. The subtitle is followed by the end subtitle symbol


**P
For example, the following lines from column 3 of 2,706,891 
actuation thereof. 
Operation 
line 9 Assuming that the plunger 163 of the hydraulic press 133 is in the down position, 

are punched 

1                                                                     72
ACTUATION THEREOF. **NOPERATION**P *3@9 *ASSUMING THAT THE PLUNGER 163 O
F THE HYDRAULIC PRESS 133 IS IN THE DOWN POSITION,


3.5 Claims, being paragraphs, are introduced with the paragraph notation (see Section 3.2 above) before the
claim number. The period after the claim number is treated as a period after an abbreviation, e.g.:


claim 4 which begins in column 6 at line 29
is punched
*6@29 4**. *THE HYDR...
3.6 At the end of the claims, the begin references cited symbol 
**D

is punched. The words “References cited in the file of this patent" and similar words are not punched. 
The references are punched in the order in which they appear in the patent. 

A single virgule / is used as a minor division for separating individual references from each other. 
Two consecutive virgules // are used as a major division for separating groups of references. Appli-
cation of these notations is amplified below.

The United States patents are punched as follows: 
number space inventor space date/number space inventor space date/number ...../...etc...

The major division symbol // follows the last character punched for the last United States patent. 

The foreign patents are punched as follows: 
number space country space date/number space country space date/number ...../...etc...
The major division symbol // follows the last character punched for the last foreign patent. 

Other reference material is punched as though it were regular patent text. When there is more than 
one such reference, one virgule / is punched at the end of each reference except the last of such
references. 


When no U. S. patents are references, two virgules are punched after the **D and precede the first 
foreign patent, as follows: 


**D// 


When no U. S. or foreign patents are references, two virgules are punched to signify the absence of 
each group of patents and precede any other reference material, as follows: 


**D//// 
After all of the references are punched, the end references cited symbol 


**E


is punched.


When no references are cited, the begin references cited symbol followed by four virgules followed 
by the end references cited symbol are punched, as follows: 


**D////**E
Thus, the following text 
References Cited in the file of this patent 


UNITED STATES PATENTS 


Number Name Date 

2,234,215 Youker................ Mar. 11, 1941 
2,376,350 Fryling.... ... May 22, 1945 
2,469,017 Sundet................. May 3, 1949 


OTHER REFERENCES 


India Rubber World, July 1949, page 476. 
Kluchesky et al., Ind. and Eng. Cheml., vol. 41, No. 8, August 1949, pp. 1768-1770. 


is punched 
1                                                                     72
**D2,234,215 *YOUKER *MAR**. 11, 1941/2,376,350 *FRYLING *MAY 22, 1945/2
,469,017 *SUNDET *MAY 3, 1949////*INDIA *RUBBER *WORLD, *JULY 1949, PAGE
476./*KLUCHESKY ET AL**., *IND**. AND *ENG**. *CHEML**., VOL**. 41, *NO*
*. 8, *AUGUST 1949, PP**. 1768@1770.**E
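
The division of the references into groups can be read back mechanically, as the following sketch suggests. The function names are hypothetical, and the sketch assumes that no virgule occurs inside an individual reference (a fraction in the text of an "other" reference would be split in error).

```python
def _split_minor(group):
    """Split one group of references on the single virgule."""
    return [ref.strip() for ref in group.split("/") if ref.strip()]


def parse_references(punched):
    """Separate the text between **D and **E into its three groups."""
    start = punched.index("**D") + len("**D")
    end = punched.index("**E", start)
    # The first two major divisions close the U.S. and the foreign groups.
    groups = punched[start:end].split("//", 2)
    groups += [""] * (3 - len(groups))
    return {
        "us_patents": _split_minor(groups[0]),
        "foreign_patents": _split_minor(groups[1]),
        "other": _split_minor(groups[2]),
    }
```

Applied to the example above, with the text fields joined into one string, the sketch returns three United States patents, no foreign patents, and two other references.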


3.7 If a patent has a Certificate of Correction, the text of the patent is first corrected by hand according 
to the Certificate, and the text is then punched as though it had been originally issued without the error 
which the Certificate corrected. 


3.8 After each patent, one blank card is inserted. [See Section 2.2].


3.9 No provisions are presently included for punching Reissue patents or Reissues of Reissue patents. 
The following portions of patent text are specifically excluded from present consideration: 
Drawings 
Preamble 
Application number and date 
Assignment information 
Classification 
Footnotes


Symbology and instructions for these areas will be introduced subsequently if needed. 
[See Section 3.3 with respect to formulas]. 


4.0 SYMBOL DICTIONARY 
Table 5 is a composite of (a) several sets of symbols which are expedient for use as identifying de- 
vices and (b) equivalents for those of the symbols which have been selected for use in the current patent 


text punching project. 

The symbols are given in the standard IBM 024 and Fortran Punch characters. The meanings for the 
symbols which have been used in this manual are given in the third column. Reference is made in the 
right-hand column to the Section and Subsection in which the symbol is discussed in the foregoing part 


of this report. 
The remaining symbols or other synthesized sets of symbols will be used if needed for data not 


presently provided for herein. 


TABLE 5.—SYMBOL DICTIONARY 


IBM 704-      STANDARD
FORTRAN       IBM 024       SYMBOL    MEANING                            REFERENCE
SYMBOL        SYMBOL

NO PUNCH
space         space                   space between words                2.6
blank card    blank card              end of document                    2.6; 3.8

SINGLE UNITS
1             1             1
2             2             2
3             3             3
.             .             .         Arabic numerals                    2.4
9             9             9
0             0             0
A             A             a
B             B             b
C             C             c
.             .             .         lower case letters                 2.3
Z             Z             z
. space       . space       .         end of sentence                    2.7
.             .             .         decimal point                      2.7
-             @             -         minus; hyphen; long dash           2.14
(             %             (         start parentheses                  2.14
)             ¤             )         close parentheses                  2.14
/             /             /         in fractions; in and/or;           2.14
                            ÷         division mark;
                                      separates references               3.6


TABLE 5.--SYMBOL DICTIONARY --Con. 


IBM 704- STANDARD REF- 
FORTRAN IBM 024 SYMBOL MEANING ER- 
SYMBOL SYMBOL ENCE 
SINGLE UNITS--Con. 
space / space space / space separates items in patent heading Sol 
= equals 2.14 
» comma 2.14 
$ dollar sign 2.14 
+ plus 2.14 

PRECEDED BY A SINGLE STAR
* space             * space                       cancels * and all space to the next      2.16
                                                  non-space
*A                  *A                  A
*B                  *B                  B
*C                  *C                  C
:                   :                   :         upper case letters                       2.3
*Z                  *Z                  Z
*1                  *1
*2                  *2
*3                  *3
:                   :                             not to be used
*0                  *0
*col. no. - line    *col. no. @ line              beginning of paragraph; the number       3.2.2
no. space           no. space                     after the * is the column number and
e.g.:               e.g.:                         the number after the @ is the number
*1-1 space          *1@1 space                    of the line on which the paragraph
:                   :                             begins.
*1-26 space         *1@26 space
:                   :
*2-4 space          *2@4 space
*24-65 space        *24@65 space
*-                  *@                            subscript                                2.12
*(                  *%                  [         begin bracket                            2.14
*)                  *⌑                  ]         close bracket                            2.14


taf begin Roman numeral 
*H begin italics or underscoring 
ois ‘; end Roman numeral 
*$ *$ end italics or underscoring 
sa *& superscript 
os ba not to be used 

PRECEDED BY TWO STARS
**A                 **A                 '         prime; apostrophe; single quote          2.11
**B                 **B                           mark for which no provision made         2.15
**C                 **C                 :         colon                                    2.14
**D                 **D                           begin references cited                   3.6
**E                 **E                           end references cited                     3.6
**F                 **F                           begin formula                            3.3
**G space           **G space                     end formula                              3.3
**H                 **H                 ... or ....
                                                  ellipsis                                 2.7.6
**I                 **I                 ?         question mark                            2.14
**J                 **J                           (available for later use)
**K                 **K                 %         per cent sign                            2.14
**L                 **L                 ∠         angle mark                               2.14
**M                 **M                 ×         multiplication sign                      2.14
**N                 **N                           begin subtitle                           3.4
**O                 **O                           not to be used
**P                 **P                           end subtitle                             3.4
**Q                 **Q                 "         begin quotation                          2.10
**R                 **R                           (available for later use)
**S                 **S                 ;         semicolon                                2.14
**T                 **T                           (available for later use)
**U                 **U                 "         end quotation                            2.10
**V                 **V                 ↓         direction sign down                      2.14
**W                 **W                 ↑         direction sign up                        2.14
**X                 **X                 !         exclamation point                        2.14
**Y                 **Y                           lower case Greek letter                  2.13
**Z                 **Z                           upper case Greek letter                  2.13
**-                 **@                           (available for later use)

                                                  begin capitalized word(s)                2.3
                                                  end capitalized word(s)                  2.3
                                                  separates the name of the inventor       3.1.2
                                                  from the name of his representative
                                                  begin bold-face type                     2.8
                                        →         direction sign right                     2.14
                                                  end bold-face type                       2.8
                                        ←         direction sign left                      2.14
                                                  end of abbreviation; period after        2.7.3; 3.5
                                                  claim number
                                                  end of abbreviation which is end         2.7.4
                                                  of sentence



MISCELLANEOUS
*+0                 *&0                 °         degree or temperature mark;
                                                  angle mark
//                  //                            end of U. S. patent references;
                                                  no U. S. patent references cited
////                ////                          no U. S. and no foreign patent
                                                  references cited
**A**A              **A**A              ''        double prime
**A**A**A           **A**A**A           '''       triple prime
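
The star conventions of Table 5 can be illustrated with a short decoding sketch. It is not one of the programs described in this report; it is a minimal Python illustration that assumes only a small subset of the dictionary (a single star before a letter for upper case, **A for the apostrophe, **M for the multiplication sign, **Q and **U for quotation marks, and *+0 for the degree mark). The names MULTI and decode are hypothetical.

```python
# Hypothetical subset of the Table 5 dictionary (IBM 704-Fortran column).
# Only entries that are legible in the table are included.
MULTI = {
    "**A": "'",   # prime; apostrophe; single quote
    "**M": "×",   # multiplication sign
    "**Q": '"',   # begin quotation
    "**U": '"',   # end quotation
    "*+0": "°",   # degree or temperature mark
}


def decode(punched: str) -> str:
    """Turn a punched string back into ordinary text (illustrative only)."""
    out = []
    i = 0
    while i < len(punched):
        three = punched[i:i + 3]
        if three in MULTI:
            # Three-character notations are matched before single stars.
            out.append(MULTI[three])
            i += 3
        elif punched[i] == "*" and i + 1 < len(punched) and punched[i + 1].isalpha():
            # A single star marks the following letter as upper case.
            out.append(punched[i + 1].upper())
            i += 2
        else:
            # Unstarred letters stand for lower case letters.
            out.append(punched[i].lower())
            i += 1
    return "".join(out)


print(decode("AN ARM**AS LENGTH"))
print(decode("450*+0"))
```

The three-character notations are tested first so that **A is not misread as a star followed by *A.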


5.0 THE IBM 704 PROGRAMS 


5.1 Two programs have been prepared at the Massachusetts Institute of Technology. The first program
takes the text from the punched cards, closes up the excess spaces which were introduced, and
transliterates this material into a standardized output format on an IBM 704 tape. The second, or search,
program locates, in context, any desired text word or words in this tape and reproduces it (them) close
to the center of approximately 100 machine words, in a format suitable for mounting on a 3 x 5 card.


5.2 The 36 binary digit (bit) word of the IBM 704 is divided for the present project into six Binary Coded
Decimal (BCD) characters. The most convenient arithmetic operations are performed on entire machine
words. This suggests that, for a high-speed search for a text word or a group of text words, the
text should first be arranged in a regular format with respect to the beginnings and ends of machine
words.


5.3 The following machine word format is used: 


Each text word begins in the second BCD position in a machine word, the first position being filled
with a blank. If the text word is more than 5 letters in length, it continues into the next machine word,
the sixth letter occupying the first BCD position and the remaining letters following consecutively. A
text word is terminated by filling the remaining positions of the last machine word into which it
extends, if any such positions remain, with blanks.

Numbers, in general, have the same word format as text words.

Exception: The patent number and the column-line paragraph notation. These are exemplified in
Section 5.5 below.

Sample output word formats are presented in Table 6 together with their text and Fortran equivalents. 


Table 6.--Output Word Formats

TEXT                IBM 704-FORTRAN     MACHINE WORD
                    SYMBOL              BCD Positions
                                        1   2   3   4   5   6
this                THIS                    T   H   I   S
apertures           APERTURES               A   P   E   R   T
                                        U   R   E   S
aperture near       APERTURE NEAR           A   P   E   R   T
                                        U   R   E
                                            N   E   A   R
valve 101           VALVE 101               V   A   L   V   E
                                            1   0   1
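
The word format of Section 5.3 can be expressed as a short routine. The following is a minimal Python sketch, not the IBM 704 program itself; the function names pack_word and pack_text are hypothetical, and ordinary strings of six characters stand in for 36-bit machine words.

```python
WORD_LEN = 6  # six BCD characters per 36-bit machine word


def pack_word(text_word: str) -> list[str]:
    """Lay one text word out in machine words per Section 5.3."""
    # The first BCD position of the first machine word is a blank.
    padded = " " + text_word
    # Letters beyond the fifth continue from position 1 of the next word.
    words = [padded[i:i + WORD_LEN] for i in range(0, len(padded), WORD_LEN)]
    # Remaining positions of the last machine word are filled with blanks.
    words[-1] = words[-1].ljust(WORD_LEN)
    return words


def pack_text(text: str) -> list[str]:
    """Pack each space-separated text word; every word starts a new machine word."""
    packed = []
    for text_word in text.split():
        packed.extend(pack_word(text_word))
    return packed


for machine_word in pack_text("APERTURE NEAR"):
    print(repr(machine_word))
```

Because every text word starts in the second position of a fresh machine word, a search can compare whole machine words rather than individual characters, which is the motive given in Section 5.2.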


5.4 Punctuation marks and other typographic symbols are subdivided into four categories. 


5.4.1 Each of the first group of punctuation marks and other symbols occupies an entire machine 
word. This group is composed of two types of notations: 

(a) Simple punctuation marks. The machine word for each of these marks begins in the third BCD 
position. The word formats for this group are shown in Table 7. 


Table 7.--Word Formats 
Symbols which occupy an entire machine word: 
(a) Simple punctuation marks 


MACHINE WORD 
IBM 704-FORTRAN BCD Positions 
SYMBOL SYMBOL, MEANING 


period (end of sentence) 
ellipsis 

exclamation point 
question mark 


comma 


semicolon 

colon 

per cent 

virgule (minor division) 


two virgules (major division) 




(b) Pairs of punctuation marks and other symbols for which there are complementary notations for
the commencement and the termination of the condition represented by the symbol pairs. The machine
word for the “begin” mark of the pair starts in the fifth BCD position. The “end” mark of the pair begins
with a ) in the third BCD position.

Exception: The “begin parentheses” machine word occupies only the sixth BCD position.

The word formats for this group are shown in Table 8. 


Table 8.--Word Formats 
Symbols which occupy an entire machine word: 
(b) Pairs of punctuation marks 


MACHINE WORD 
IBM 704-FORTRAN BCD Positions 
SYMBOL SYMBOL, MEANING 
4 5 6 
( begin vets parentheses 
end 
, ( [begin italics 
I end 
ma Q ( Ir begin quotes 
AU Bi Q end 
ee 
B ( begin bold face 
**$ ) B end 
“* 
( Cc ( begin capitalized word(s) 
fd) ( Cc end 
* 
/ e N ( ae Roman numerals 
* 
: ( = subtitle 
end 
a H ( oon formula 
ASG a end 
* 
( K ex begin brackets 
*) ) K end 
eee be 
A ( peein references cited 
) R end 
E ( detected error isolating 
) E | symbol; see Section 5.8 
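
The positional rule for paired marks in Section 5.4.1(b) can be sketched as follows. This is an illustrative Python fragment under stated assumptions: each pair is identified by a single letter (such as Q for quotes, B for bold face, K for brackets, E for the error symbol), and the layout of the "end parentheses" word, which the text does not spell out, is assumed to be a lone ) in the third position. The names begin_word, end_word, BEGIN_PAREN and END_PAREN are hypothetical.

```python
WORD_LEN = 6  # six BCD characters per machine word


def begin_word(letter: str) -> str:
    """'Begin' mark: the letter and a ( in BCD positions 5 and 6."""
    return " " * 4 + letter + "("


def end_word(letter: str) -> str:
    """'End' mark: a ) in BCD position 3 followed by the letter in position 4."""
    return ("  )" + letter).ljust(WORD_LEN)


# Exception stated in the text: "begin parentheses" occupies only position 6.
BEGIN_PAREN = " " * 5 + "("
# Assumed layout for "end parentheses": a ) alone in position 3.
END_PAREN = "  )".ljust(WORD_LEN)

print(repr(begin_word("Q")), repr(end_word("Q")))
```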


5.4.2 Each mark of the second group is part of a machine word. The BCD position in which the mark
is noted in the output tape depends on the text word in which the mark is contained. The marks of this
group are the

abbreviation period
period non-space (e.g., a decimal point in a number: 1.259)
comma non-space (e.g., in a number: 1,259)
apostrophe

In the output, both periods appear as periods. However, both the comma and the apostrophe appear as
commas.

Examples of machine words containing the symbology for these marks and the corresponding text
words are given in Table 9.


Table 9.--Word Formats
Periods and Commas

TEXT                IBM 704-FORTRAN     MACHINE WORD
                    SYMBOL              BCD Positions
etc.                ETC**,
3,176.45            3,176.45
76.45               76.45
an arm's length     AN ARM**AS LENGTH
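
The treatment of marks inside a word can be sketched in the same way. The fragment below is a hedged Python illustration, not the transliteration program: it handles only the apostrophe notation **A, which appears as a comma in the output, and it assumes that the resulting word is then laid out by the rule of Section 5.3. The name output_word is hypothetical.

```python
WORD_LEN = 6  # six BCD characters per machine word


def output_word(fortran_word: str) -> list[str]:
    """Machine words for one punched word containing in-word marks."""
    # The apostrophe notation **A appears as a comma in the output.
    # Periods and non-space commas are carried over unchanged.
    text = fortran_word.replace("**A", ",")
    padded = " " + text  # first BCD position is a blank
    words = [padded[i:i + WORD_LEN] for i in range(0, len(padded), WORD_LEN)]
    words[-1] = words[-1].ljust(WORD_LEN)  # blank-fill the last machine word
    return words


print(output_word("ARM**AS"))
print(output_word("3,176.45"))
```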


5.4.3 Two symbols which occupy entire machine words but differ because of word format or substance
from the foregoing two groups are given in Table 10.


5.4.4 The last category is a residual one for all other marks and symbols. There is no transliteration
for these marks; they appear in the machine word output in IBM 704-Fortran symbology. They occupy
parts of machine words in the same manner as the marks of group (2) when they are parts of words
in patent text. They start new machine words which begin in the second BCD position only when they
are preceded by a space in patent text. The series of symbols which constitute a single notation appear
in consecutive positions in a machine word rather than in new machine words. Illustrative word formats
are shown in Table 11.


Table 10.--Word Formats 
Special Group 


MACHINE WORD 


SYMBOL 
aa letter following is upper case 


Se) undefined symbol 


ao) s 


Table 11.--Word Formats
Illustrations for All Other Marks

TEXT                IBM 704-FORTRAN     MACHINE WORD
                    SYMBOL              BCD Positions
450°                450*+0
∠ abc               **L ABC


5.5 The patent number and column-line paragraph notation always occur together. In transliterated
form, the patent number follows a ( which appears in the third BCD position of the machine word;
remaining spaces of the second machine word into which the number extends, if any such spaces remain,
are filled with blanks. The column-line paragraph notation starts in the fifth BCD position and is
followed by a ) and the remaining spaces of the machine word are filled with blanks. The program for
creating the IBM 704 tape inserts the patent number and paragraph notation at intervals of about 120
machine words in long paragraphs of patent text. The repeated number differs from its first occurrence
by the addition of a + in the fourth BCD position immediately preceding the paragraph notation.

The machine words for patent number and paragraph notation for patent 2,709,339 and paragraph
beginning in column 2 at line 51 are shown in Table 12.


Table 12.--Word Formats
Patent and Paragraph Numbers

                        MACHINE WORD
                        BCD Positions
                        1   2   3   4   5   6
First occurrence                (   2   ,   7
                        0   9   ,   3   3   9
                                        2   -
                        5   1   )
Repeated occurrence             (   2   ,   7
                        0   9   ,   3   3   9
                                    +   2   -
                        5   1   )
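
The layout described in Section 5.5 can be sketched as a small routine. This is a minimal Python illustration under the assumption that the rules are applied literally as stated (a ( in position 3, the paragraph notation from position 5, a + in position 4 on repetition); the function names chunk and patent_words are hypothetical.

```python
WORD_LEN = 6  # six BCD characters per machine word


def chunk(chars: str) -> list[str]:
    """Split a character string into blank-filled six-character machine words."""
    words = [chars[i:i + WORD_LEN] for i in range(0, len(chars), WORD_LEN)]
    words[-1] = words[-1].ljust(WORD_LEN)
    return words


def patent_words(patent: str, column: int, line: int, repeated: bool = False) -> list[str]:
    """Machine words for a patent number and its column-line paragraph notation."""
    # The ( sits in BCD position 3; the patent number follows it.
    number = chunk("  (" + patent)
    # Position 4 holds a + on repeated occurrences, otherwise a blank.
    flag = "+" if repeated else " "
    # The paragraph notation starts in position 5 and is closed by a ).
    paragraph = chunk("   " + flag + f"{column}-{line})")
    return number + paragraph


print(patent_words("2,709,339", 2, 51))
print(patent_words("2,709,339", 2, 51, repeated=True))
```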


5.6 A graphic description of the machine-word structure of the transliterated text, as it is stored on
magnetic tape, is given in Figure 1. In the description, the characters are replaced by BCD codes and
the direction of the tape is toward the bottom of the page. The description is for the following text from
patent 2,709,339 which begins a paragraph starting in column 2 at line 66:


The present invention contemplates a PUMP capable of developing pressures as high as
5000 p.s.i., while operating (with a suitable fluid) at temperatures up to 450° Fahrenheit; the
two-stage pumping system is shown in Fig. 2' in which D at point 36a stands for “LOAD.”


GRAPHIC DESCRIPTION OF TRANSLITERATED TEXT
Figure 1

5.7 An example is shown in Figure 2 of an output format of the search program for the phrase

aircraft landing gear

The program provides a print-out of 48 columns (8 machine words) and 16 lines, with the phrase near
the middle, so that the phrase can be read in context. The ninth line begins with a mnemonic of the
phrase, in this case

ALG

immediately preceded by an * and followed by a ) and then the phrase itself.


MACHINE WORDS 


2 3 4 5 6 7 8 
THE PILOT <i “|i ie eS} PRESS 
SENSITIVE ‘ RESPONDING TO THE 
OUTPUIT PRESS|URE OF PUMP salt (e 
; WHICH) WILL TEND | TO VARY WITH | CHANG 
ES IN (2,7/09,339 +2-151) THE LOAD 
DEMAND 5 *| FOR EXAMPILE ; 
ASSUMING THAT THE LOAD | COMPRISES THE 
HYDRAULIC ACTUAITING CYLINDERS OF AN 
Phrase searched} * ALG) AIRCRIAFT ‘) LANDING GEAR , CONNE 
CTED TO THE PUMP I|NG SYSTE|M BY 
A SUI TAIBLE CONTRIOL VALVE (| NOT 
SHOWN WHICH] IS OPENED WHEN THE 
PILOT! DESIRIES TO ACTUATE THE LANDI 
NG GEAR ; THE NORMAIL CONDIITION 
WILL | BE ONE IN WHICH) THE VALVE] IS 
CLOSED ; CAUS ING | THE PRESSURE 


OUTPUT FORMAT OF SEARCH PROGRAM 
Figure 2 
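
The search and display of Section 5.7 can be approximated in a few lines. The following Python sketch is not the MIT search program; it assumes that the tape is a list of six-character machine words laid out as in Section 5.3, that the phrase is matched by comparing whole machine words, and that the marker (an *, the mnemonic, and a )) fits in a single machine word. The names pack_text, find_phrase and display are hypothetical.

```python
WORD_LEN, PER_LINE, LINES = 6, 8, 16  # 8 machine words = 48 columns per line


def pack_text(text: str) -> list[str]:
    """Lay text words out in six-character machine words (Section 5.3)."""
    packed = []
    for word in text.split():
        padded = " " + word
        cells = [padded[i:i + WORD_LEN] for i in range(0, len(padded), WORD_LEN)]
        cells[-1] = cells[-1].ljust(WORD_LEN)
        packed.extend(cells)
    return packed


def find_phrase(tape: list[str], phrase: str) -> int:
    """Index of the first machine word of the phrase, or -1 if absent."""
    target = pack_text(phrase)
    for i in range(len(tape) - len(target) + 1):
        if tape[i:i + len(target)] == target:  # whole-word comparison
            return i
    return -1


def display(tape: list[str], hit: int, phrase: str) -> list[str]:
    """Lines of context with the marked phrase opening the ninth line."""
    lead = (LINES // 2) * PER_LINE  # 64 machine words fill the first 8 lines
    mnemonic = "".join(word[0] for word in phrase.split())
    marker = ("*" + mnemonic + ")").ljust(WORD_LEN)[:WORD_LEN]
    before = tape[max(0, hit - lead):hit]
    before = [" " * WORD_LEN] * (lead - len(before)) + before  # pad short context
    after = tape[hit:hit + lead - 1]
    cells = before + [marker] + after
    return ["".join(cells[i:i + PER_LINE]) for i in range(0, len(cells), PER_LINE)]


tape = pack_text("CYLINDERS OF AN AIRCRAFT LANDING GEAR CONNECTED TO THE PUMPING SYSTEM")
hit = find_phrase(tape, "AIRCRAFT LANDING GEAR")
for row in display(tape, hit, "AIRCRAFT LANDING GEAR"):
    print(row)
```

Comparing whole machine words is what the regular word format of Section 5.2 is meant to allow; a character-by-character scan would not need that format but would not reflect the machine's word-oriented operations.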


5.8 The program for transliterating the punched card “alphabet” to the IBM 704 tape symbology also 
detects and displays errors in punching. The detected error isolating symbols are shown in Table 8. 
The display begins with the symbol 


E( 
which is consecutively followed by 
(1) a description of the type of error 
(2) three blocks of five characters each in BCD positions 2 through 6, each block preceded by a 
virgule / in the first position 
(3) a virgule / in the first position of the following machine word and the symbol )E in positions 3 
and 4. 


A sample display on detection of the improper punching of patent number 2,364,357 as 2,364K357 is 
illustrated in Figure 3. 


MACHINE WORDS

   1      2      3      4      5      6      7      8

123456 123456 123456 123456 123456 123456 123456 123456


SAMPLE ERROR DISPLAY
Figure 3
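
The sequence of machine words in an error display, as listed in Section 5.8, can be sketched as follows. This Python fragment is an illustration only: the wording of the error description, its layout as ordinary text words, and the choice of the fifteen displayed characters are assumptions, since the report does not specify them. The begin symbol is placed in positions 5 and 6 in line with the rule for "begin" marks in Section 5.4.1(b). The name error_display is hypothetical.

```python
WORD_LEN = 6  # six BCD characters per machine word


def error_display(description: str, context: str) -> list[str]:
    """Machine words for an error display in the order given in Section 5.8."""
    words = ["    E("]  # begin symbol E( in BCD positions 5 and 6
    # (1) Description of the type of error, assumed to be laid out as text words.
    for word in description.split():
        padded = " " + word
        cells = [padded[i:i + WORD_LEN] for i in range(0, len(padded), WORD_LEN)]
        cells[-1] = cells[-1].ljust(WORD_LEN)
        words.extend(cells)
    # (2) Three blocks of five characters, each preceded by a virgule in position 1.
    shown = context.ljust(15)[:15]
    for start in range(0, 15, 5):
        words.append("/" + shown[start:start + 5])
    # (3) A virgule in position 1 and the symbol )E in positions 3 and 4.
    words.append("/ )E  ")
    return words


for machine_word in error_display("PATENT NUMBER", "2,364K357"):
    print(repr(machine_word))
```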

